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Abstract — Different types of convolution operations involving 
large Vandermonde matrices are considered. The convolutions 
parallel those of large Gaussian matrices and additive and 
multiplicative free convolution. First additive and multiplicative 
convolution of Vandermonde matrices and deterministic diagonal 
matrices are considered. After this, several cases of additive 
and multiplicative convolution of two independent Vandermonde 
matrices are considered. It is also shown that the convergence of 
any combination of Vandermonde matrices is almost sure. We will 
divide the considered convolutions into two types: those which 
depend on the phase distribution of the Vandermonde matrices, 
and those which depend only on the spectra of the matrices. A 
general criterion is presented to find which type applies for any 
given convolution. A simulation is presented, verifying the results. 
Implementations of all considered convolutions are provided 
and discussed, together with the challenges in making these 
implementations efficient. The implementation is based on the 
technique of Fourier-Motzkin elimination, and is quite general as 
it can be applied to virtually any combination of Vandermonde 
matrices. Generalizations to related random matrices, such as 
Toeplitz and Hankel matrices, are also discussed. 

Index Terms — Vandermonde matrices, Random Matrices, con- 
volution, deconvolution, limiting eigenvalue distribution. 



I. Introduction 

Certain random matrices have in the large dimensional 
limit a deterministic behavior of the eigenvalue distributions, 
meaning that one can compute the eigenvalue distributions 
of AB and A + B based only on the individual eigenvalue 
distributions of A and B, when the matrices are independent 
and large. The process of computing theses eigenvalues is 
called convolution, or de-convolution when one would like to 
compute the inverse operation. Gaussian-like matrices fit into 
this setting, and the concept which can be used to find the 
eigenvalue distribution from that of the component matrices 
in this case is called freeness [1]. Free probability theory 0, 
which uses the concept of freeness, is not a new tool but has 
grown into an entire field of research since the pioneering 
work of Voiculescu in the 1980's (0, 0, 0, 0). However, 
the basic definitions of free probability are quite abstract and 
this has hinged a burden on its actual practical use. The 
original goal was to introduce an analogy to independence 
in classical probability that can be used for non-commutative 
random variables like matrices. These more general random 
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variables are elements of what is called a noncommutative 
probability space. The convolution/deconvolution techniques 
used are various. The classical ones are either analytic (using 
R and S transforms 0, 0) or based on moments 0, 0, 
1 9 1, [ 1 1 . Recent deconvolution techniques based on statistical 
eigen-inference methods using large Wishart matrices ifTTI . 
random Matrix theory lfl2l or other deterministic equivalents a 
la Girko lfl3ll . ifPfl . OH, lfl6l were proposed and are possible 
alternatives. Each one has its advantages and drawbacks. 
Unfortunately, although successfully applied ifTTIl . p8) , all 
these techniques can only treat very simple models i.e. the case 
where one of the considered matrices is unitarily invariant. 
This invariance has a special meaning in wireless networks 
and supposes that there is some kind of symmetry in the 
problem to be analyzed. The moments technique, which will 
be the focus of this work, is very appealing and powerful 
in order to derive the exact asymptotic moments of "non- 
free matrices", for which we still do not have a general 
framework. It requires combinatorial skills and can be used 
for a large class of random matrices. The main drawback of 
the technique (compared to other tools such as the Stieltjes 
transform method |19|) is that it can rarely provide the exact 
eigenvalue distribution. However, in many applications, one 
needs only a subset of the moments depending on the number 
of parameters to be estimated. 

Recently 11201 . Vandermonde matrices (which do not fall 
within the free probability framework) were shown to be 
a case of high interest in wireless communications. Such 
matrices have various applications in signal reconstruction 
||2TI . cognitive radio 11221 . physical layer security [23], and 
MIMO channel modeling ll24l . A Vandermonde matrix with 
entries on the unit circle is on the form 



/ 1 



V 



1 



V 



a _j(jV-l)wi . . . -j(N-l)u L 



(1) 



V will in this paper always denote a Vandermonde matrix, 
and its dimension will be denoted N x L. The u>i,...,ujl, also 
called phase distributions, will be assumed i.i.d., taking values 
in [0,27r). We will also assume, as in many applications, 
that N and L go to infinity at the same rate, and write 
c = lirriAr^oo -fe for the aspect ratio. If necessary, we will write 
V u to emphasize the actual phase distribution, or V uc to also 
emphasize the aspect ratio. In ||20| , the limit eigenvalue distri- 
butions of combinations of V H V and diagonal matrices D(iV) 
were shown to be dependent on only the limit eigenvalue 
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distributions of the two matrices. Important combinations are 
the multiplicative and additive models, 

D(A)V ff V and D(N) + V H V. (2) 

In the large TV-limit, (O thus gives rise to two convolution 
operations, 

1) limjv^oo D(iV)V^V and ^^^(D^) + 
V H V), 

which thus depend only on the input spectra. Here lim is 
used to denote the limit of the eigenvalue distribution of the 
considered matrix, in an appropriate metric. However, it is 
not clear from |20| how 1) can be computed algorithmically, 
as only sketches for this were provided. We also have the 
operations 

2) limjv^oo T>(N)W H and limj V ^ 00 (D(A^) + 
VV H ), 

for which it is unknown whether the result only depends on the 
spectra. This case happens in practical scenarios (for cognitive 
applications 11221 as well as secure transmissions [23]) when a 
Vandermonde precoder V is used in a given Toeplitz channel 
matrix D(A) independent from V. One can then compute 
cognitive and secrecy rates. When we replace with independent 
Vandermonde matrices Vi and V2 which may or may not 
have the same phase distributions, it is also unknown if the 
convolution operations 

3) limjv^oo Vf ViVf V 2 and lim A r^ 00 (Vf Vi + 
VfV a ), 

4) Hindoo VxVf V 2 Vf and lim A r^ 00 (V 1 Vf + 

v 2 vf), 

only depend on the spectra of Vi and V2. These cases 
are important for the recovery of the distribution of sensors 
(which are deployed in a clustered manner with different mean 
positions) and in the case of MIMO multi-fold scattering |25|. 

Expressions such as 4), when different types of matrices are 
multiplied, will in the following be called mixed moments. 

In this contribution we explain which of the above op- 
erations depend only on the spectra of the matrices, state 
expressions for those convolutions (in fact, we also state 
expressions for the cases where the result can not be written 
in terms of the spectra), explain how these expressions have 
been obtained algorithmically, and explain an accompanying 
software implementation (26), ||27l of the corresponding al- 
gorithms. We also attempt to complete the analysis started 
in ED I, by stating a very general criterion for when the mixed 
moments of (many) Vandermonde matrices and deterministic 
matrices depend only the input spectra: 

If there are no terms on the form V^V 2 in a mixed 
moment, with Vi and V 2 independent and with different 
phase distributions, the mixed moment will depend only on 
the spectra of the input matrices. In all other cases, we 
can't expect dependence on just the spectra of the input 
matrices, and the mixed moment can depend on the entire 
phase distributions of the input matrices. 

The software implementation can in fact be extended to 
handle all cases which meet this criterion, as well as cases 
where knowledge of the phase distribution also is required. In 
this way it is an indispensable tool, as it automates the very 



tedious computations inherent in the presented formulas, for 
which no simple expressions are known. 

Concluding from the criterion, 1) will depend only on the 
spectra (as shown in ll20l ). as does 3). 4) may not depend on 
only the spectra when the two phase distributions are different. 
Despite this, 4) is interesting in its own right, since it has a 
geometric interpretation in terms of phase distributions, and 
is therefore handled separately. For case 2), we state more 
generally that when the pattern D(A)V appears in a mixed 
moment, we can not expect dependence only on the spectrum. 

It turns out that other types of random matrices can use 
the same methods as for Vandermonde matrices to compute 
their moments, such as Toeplitz matrices and Hankel matrices. 
We will explain how the software implementation has been 
extended to handled these matrices as well. 

The paper is organized as follows. Section [TT] provides 
background essentials on random matrix theory needed for 
the main results, which are stated in [Til] The results include 
the precise statement of the criterion above for when we only 
have dependence on the spectra of the matrices, results on the 
convolution operations l)-4), and extensions to related random 
matrices such as Toeplitz and Hankel matrices. A general- 
ization of our results to almost sure convergence of matrices 
is also made. All presented formulas are obtained from the 
implementation, and the major pieces in this implementation 
are gone through in Section |IV] such as partition iteration, 
and Fourier-Motzkin elimination |28|. Section [V] presents a 
simulation which verifies the results. 

II. Random matrix Background Essentials 

In the following, upper (lower boldface) symbols will be 
used for matrices (column vectors), whereas lower symbols 
will represent scalar values, (.) T will denote the transpose 
operator, (.)* conjugation, and {.) H = ((.) T ) hermitian 
transpose. 1^ will represent the L x L identity matrix. We let 
Tr be the (non-normalized) trace for square matrices, defined 

by, 

L 

Tr(A) = y a u , 

i=l 

where an are the diagonal elements of the L x L matrix A. 
We also let tr be the normalized trace, defined by tr(A) = 
iTr(A). 

In the following we will implicitly assume that L and N go 
to infinity in such a way that -4 — > c. D r (N) , 1 < r < n will 
denote non-random diagonal Lx L matrices. We will have use 
for the following definition: 

Definition 1: We will say that the {D,,(A^)} 1 <,,<„ have a 
joint limit distribution as N — > 00 if the limit 

I), lim tr(D n (A)---D ls (A)) (3) 

N— >oo 

exists for all choices of i\, i s £ {1, .., n}. 

A joint limit distribution for the D r (A) will always be 
assumed in the following. The corresponding concept for 
random matrices is the following: 

Definition 2: Let {A n }^ =1 be an ensemble of (square) ran- 
dom matrices. We say that {A n }^ =1 converge in distribution 
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if the limit 



lim E[tr((A„) r )] 



(4) 



exists for all r. We will say that ensembles {A ln , A 2 „, ■■■}^Li 
of random matrices converge in distribution if the limit 



lim E[tr(A il „A i . 



>)] 



(5) 
is well- 



exists whenever the matrix product Aj in Ai 2 „ ■ ■ ■ A 
defined, and square. 

When we refer to moments, we will generally mean ©, 
while mixed moments refer to (f5]). A stronger form of conver- 
gence, which we will generalize our results to, is almost sure 
convergence in distribution. This type of convergence requires 
that (@), (0 are replaced with 

tr((A„) r ) a 4-CV 



ti (Ai lfl Ai 2ll • • • A 



where C r ,Ci lt ... t i B are constants. 

We will also need some basic concepts from partition theory. 
7(n) will denote the partitions of {l,...,n}. For a partition 
p = {W u ...,W r } S 7{n), W%, W r denote its blocks, 
while \p\ — r denotes the number of blocks, \\p\\ = n the 
number of elements in the partition. We will write k ^ p I 
when k and I belong to the same block of p. We will also 
write b(i) for the index of the block in p i belongs to. Partition 
notation is adapted to the mixed moment (f3]i in the following 
way: 

Definition 3: For p = {Wi, Wk}, with W% — 
{wn, Wi\ Wi \}, we define 

D Wi = D,_. (6) 



D P = 



Wi- 



The set of partitions is a partially ordered set under the 
refinement order, i.e. pi < p 2 whenever any block of pi is 
contained within a block of pi- By pi V P2 we will mean 
the smallest partition (w.r.t. the refinement order) which is 
larger than both p\ and p2- V will in our results be used in 
conjunction with the partition [0, l] n £ y(2n), defined by 

[0,l]„ = {{l,2},{3,4},...,{2n-l,2n}}. 

[0,1] „ is an example of what is called an interval partition, 
meaning that each block consists solely of successive numbers. 
We will also write [•, •] for the intervals in an interval partition, 
so that we could also have written 

[0,l]„ = {[l,2],[3,4],...,[2n-l,2n]}. 

We will in the following consider the trace of a general 
mixed moment of Vandermonde matrices and deterministic 
matrices, the only requirement being that matrices and their 
adjoints appear in alternating order so that the resulting matrix 
is square: 



tr (DxWVgVfc • ■■■D n (N)Vf 2n _V l 



(8) 



where Vi,V2,... are assumed independent and with phase 
distributions u>i,u>2, ■■■■ In particular, we assume that N 2k = 



iVi 2fc _ 1 when the V$ are Ni x L, in order for the dimensions 
of the matrices in (HJ to match. It turns out we can obtain 
the asymptotic behavior of ^ for arbitrary continuous phase 
distributions tOj. For we will let <? t> e the partition in CP(2n) 
defined by equality of the phase distributions, i.e. j ^ a k if and 
only if LOi — LOi k (ij and if. may or may not be different for 
this). Similarly we will let u\ be the partition in CP(2n) defined 
by dependence of the Vandermonde matrices, i.e. j k if 
and only if ij = i^. Obviously, o\ < a. 

III. Statement of main results 

The main result of the paper addresses moments on the form 
©, and goes as follows. 

Theorem 1: Let Vj be independent Ni x L Vandermonde 
matrices with aspect ratios Cj = liiriAr.^oo jj- and phase 
distributions u>i with continuous densities on [0,27r). The 
mixed moment 



limta (D!(JV)V^V i2 ■ ■■U n (N)Vf 2n _ i y i 



(9) 



always exists when Dj(iV) have a joint limit distribution. 
When a > [0, l] n (i.e. there are no terms on the form 



Vj , with Vi and Vj independent and with different phase 
distributions), (0 depends only on the moments 



V, 



(i) 



lim E 

N^oo 



tr 



(vfv, 



I),,. = lim tT(D il {N)---T> i .(N)), 

N— *oo 

the aspect ratios c iy and a, and assumes the form 

r 

.,J 1 ,...,Jr,fcl,...,fcrAl,...,» i \\ V 3 



s,r,i t ,jt,kt 



it ' 



(10) 



(7) where the a, 



,31, 



,3tM 



are rational numbers. 



Theorem Q] is proved in Appendix [A] and states exactly 
when we can hope for performing deconvolution, either by 
inferring on the spectrum of Di(iV), or on the spectrum or the 
phase distribution of V* from (|9). The proof will also state 
concrete expressions for the mixed moments which parallel 
the expressions of [20|, and also summarize the algorithm 
needed to compute these expressions, as performed by the 
implementation. The implementation is thus moment-based, 
in that it computes the moments as defined in (|4), from the 
moments of the input matrices. We do not know any other 
methods than that of moments to infer on the spectra of such 
matrices, since other analytical tools have not been developed 
yet. 

As an example, Theorem Q] states that 



tr 



Vi 



) ff (V! 



))' 



(ID 



which characterize the singular law of a sum of independent 
Vandermonde matrices, depend only on the moments when the 
Vi are independent with the same phase distribution. When 
the phase distributions are different, however, the same can 
not be said. The final observation in Theorem Q] about the 
polynomial form of the mixed moment is also important, since 
it is a property shared with freeness. Although ( TTOb is seen 
not to be multi-linear in the moments in general, several of 
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the particular convolutions we consider will be seen to have 
such a multi-linearity property. 

In the following, we state expressions for the convolutions 
l)-4) on the form ( TTOb . Their proofs will be apparent from 
the proof of Theorem [T] and can be found in Appendix [B] 
The aspect ratio c will be handled in a particular way in 
these results, so that it is applied outside the algorithm itself. 
The results are stated so that it is possible to turn them 
around for "deconvolution": for instance, from the moments 
of D(7V)V ff V, one can infer on the moments of T>(N). 
The application of the theorems in terms of deconvolution is 
certainly as important as the limit results themselves, since 
it enables us to infer on the parameters in an underlying 
model (here represented by D(iV) and V). The accompanying 
implementation of this paper also supports deconvolution. 

As for the convolutions 2), this form is not compatible with 
the form © due to the placement of the D(AT). We will 
therefore not handle this operation, only state in Appendix iBl 
why one in this case can't expect that the result only depends 
on the spectra of D(iV) and V. 

All formulas in the following are generated by the accom- 
panying software implementation, which is gone through in 
Section [TV] Implementation details pertaining to the different 
convolutions are gone through in Appendix [B] Note that the 
software implementation is capable not only of generating the 
listed mathematical formulas for the convolutions, but also to 
perform the computations numerically, as would be needed in 
real-time applications. 



A. The convolutions \im.N— too 'D(N)\ rH *V and 
lim Jv ^ 00 (D(^)+V H V) 

In Theorem 1 of lEOl . the moments 
lirn/v— ►oo tr ((D(iV)V ff V) ) were expressed in terms 
of the integrals 

h.u = (2vr) fe - 1 / Pul {x)\ (12) 
Jo 



where c = lirriAr^oo 4. Then we have that 
Mx = Di 

M 2 = D 2 -Dl + DlV 2 
M 3 = D 3 - 3L> 2 £»i + 3£> 2 DiV2 
+2D\ - 3DfV 2 + DlV 3 

M 4 = D 4 - |d| + |d|K 2 - 4D 3 Di 

+4D3D1V2 + \2D 2 D\ - 18D 2 DjV 2 

+6D 2 DlV 3 -™D$ + ^D$V 2 

-6DfV 3 + DfVi 

where all coefficients are rational numbers. Also, whenever 

{M n }i<„< fc are known, and {V n }i< n <k(or {D„}i<„<fc) also 
are known, then {£>„}i<„<fc(or {V n }i< n <k) are uniquely 
determined. 

The proof of Theorem [2] can be found in Appendix [B] 
Restricting to uniform phase distribution we get the following 
result, also generated by the implementation. 

Corollary 1: When V has uniform phase distribution, we 
have that 

Mi = Di 

M 2 = D 2 + D\ 

M 3 = D 3 + ?>D 2 D X + D\ 

M 4 = Z> 4 + \dI + 4Z? 3 L>i + &D 2 D\ + D\ 

o 

The additive convolution in 1) can be split into sums of 
many terms similar to ( fl5l l. and for each term, the results 
of ll20l can be applied. We obtain the following result, also 
proved in Appendix IB] 

Theorem 3: Assume that has a phase distribution with con- 
tinuous density, 

M n = c Jim tr (jp(N) + V H V)") , 

where c = limjv^oo jf- With V n as in ( fT3l ) and D n as in ( fl4l ). 
we have that 



Puj being the density of the phase distribution. These again 
determine the moments of V ff V uniquely ((13) and (20) 
in ll20l ). so that, indeed, the moments of the matrices (f2]i 
depend only on the spectra of the input matrices. This gives 
the following result for the multiplicative convolution in 1): 

Theorem 2: Assume that V has a phase distribution with 
continuous density, 

V n = lim tr((V H V) n ) (13) 
D n = c lim tr(D(iV)") (14) 

M n = c lim tr ((D(7V)V H V)") , (15) 



Mi = Dx + 1 

M 2 = D 2 + 2D X + V 2 

M 3 = D 3 + 3D 2 + + V 3 

M 4 = D 4 + 4D 3 + 2D 2 + AD 2 V 2 

-2D\ + 2D\V 2 + 4D t Va + V 4 

where all coefficients are rational numbers. Also, whenever 

{M n }i<„< fc are known, and {V n }i< n <k(oi {D n }i< n < k ) also 
are known, then {-D„}i<„<& (or {y„}i<„< fe ) are uniquely 
determined. 

Restricting to uniform phase distribution we get another 
specialized result: 

Corollary 2: When V has uniform phase distribution, we 
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have that 

Mi = Di + 1 

M 2 = D 2 + 2D 1 + 2 

M 3 = D 3 + 3D 2 + 6.D1 + 5 

M 4 = L> 4 + 4L> 3 + 10D a + 2L> 2 + 20L>i + 

fi. 77ie convolutions limTv-^oo V^ViV^V^ and 
Hm^^CVfVi+VfVa) 

The following result says that the convolution 3) only 
depends on the spectra of the input matrices: 

Theorem 4: Assume that Vi and V2 are independent Van- 
dermonde matrices where the phase distributions have contin- 
uous densities, and set 

= „'^ t '(( v " v ')") 

V<»> . „lM.r((V ? V 2 )") 
M„ = lim tr((VfViVfV a )") (16) 

N— ► oo 

N n = lim tr ((Vf V x + Vf V 2 ) n ) (17) 

M n ,N n are completely determined by V 2 \ , V3 , and 
the aspect ratios ci = limjvi-«x3 j^-, c 2 = lirri^-^co jfe. 
Moreover, M n , N n are higher degree polynomials in the 
V2 , V3 , ... on the form ( fTUb . Also, whenever {M n }x<n<k 
(or {N n }i< n < k ) are known, and {v} n) }i< n < k also are 
known, then {V^ }i< n <fc are uniquely determined. 

The proof can be found in Appendix [B] Due to the com- 
plexity in the expressions , we do not state formulas for the 
first moments in Theorem [4] 

Interestingly, since the joint distribution of {V H V, D(/V)} 
is not multi-linear in the moments of D(iV), while the joint 
distribution of {V^Vi, V| f V 2 } is, it is seen that the joint 
distributions are different in the two cases, even if the moments 
of the component matrices are the same. 



C. The convolution liirijv— >oc ViV^ V 2 V~2 when the matri- 
ces have equal phase distribution 

When the phase distributions are different, Theorem Q] ex- 
plains that the moments of V\ Vf r V 2 V| f are not necessarily 
expressible in terms of the moments of the component matri- 
ces. This is, however, the case when the phase distributions 
are equal. We thus have the following result, which proof can 
be found in Appendix |B| 

Theorem 5: Assume that Vi and V 2 are independent Van- 
dermonde matrices with the same phase distribution, and that 
this has a continuous density, and set 

V n = Km tr((VfV 4 )") 
M n = lim tr((VfV 2 VfV 1 ) i ). 



Then we have that 

Mi = -I + V2 

M 2 = -3 + 6V 2 - W 3 + Vi 

M 3 = -58 + 123F 2 - 96V3 + 3914 - 9V 5 + V 6 

21532 410726 321191 44516 

-7721/5 + 136\/ 6 - I6I/7 + V 8 

Restricting to uniform phase distribution we get another 
specialized result: 

Corollary 3: When V\ and V 2 have uniform phase distri- 
bution, we have that 



Mi 


= 1 


M 2 


= 2 


Ms 


= 5 




44 


Mi 






~3~ 



D. TheconvolutionliuiN^ (v^ (v£V) + v£ 2) (V£ 2) ) 

V H V can be viewed as the sample covariance matrix of the 
random vector (l,e -3 ' a ', e~^ N ~^ u ). A similar interpreta- 
tion of the convolution (vl\ ] (vjft)* + V$ (v^)^ is 

thus as a sample covariance matrix of a random vector of the 
same type, but where the phase distribution is U\ parts of the 
time, and uj 2 the rest of the time. This convolution does not 
satisfy the requirement a > [0, l] n from TheoremQ] so there is 
no guarantee that the result only depends on the spectra of the 
input matrices. It will be apparent from Theorem [6] below that 
the dependence is, indeed, on more than just these spectra: 
Knowledge about the phase distributions is also required, 
and we will in fact interpret this convolution instead as an 
operation on phase distributions. 

Consider first two independent Vandermonde matrices 
Vo^ci, "Vu}c 2 with an equal number of rows N and with 
a common phase distribution lu. By stacking v£, Cl , V^ 2 c 2 
horizontally into one larger matrix, it is straightforward to 
show that the distribution of 

V« (yW )" (V ( V ) H (18) 

equals that of V W:Cl+C2 V^ Cl+C2 . This case when the phase 
distributions are equal is therefore trivial. 

When V"iV jC , V^c are independent with the same number 
of rows, but with different phase distributions, computing the 
distribution of 

V« (V 1 ' ) H +V< 2 ) (V 2 > ) H (19) 

seems, however, to be more complex. The following result 
explains that, at least in the limit, the situation is simpler. There 
the sum can be replaced by another Vandermonde matrix, 
whose phase distribution can be constructed in a particular 
way from the original ones: 

Theorem 6: Let V Wl C1 and V W2 C2 be independent N x L\, 
N x L 2 random Vandermonde matrices with phase distri- 
butions lo\, u>2, respectively, and with aspect ratios c\ = 
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distribution of 



Urn 



JY- 



*oo ^fi respectively. Then the limit 



moments vanish). These moments are given by 



V V H 

v Wl.Cl v W1,C1 



V V H 



equals that of 



(20) 



(21) 



where Wi * cliC2 denotes the phase distribution with density 
3j+^-( c iPwi + C2P^ 2 ), where p u)ll p UJ2 are the densities of the 
phase distributions ojx,u>2- 

The proof of Theorem [6] can be found in Appendix [C] The 
result is only asymptotic, meaning that the mean eigenvalue 
distribution for finite N of the two mentioned matrices are in 
fact different. This can be seen by setting L = N = 2, and 
observing that the distribution of | (e JW1 + e J " 2 ) is in general 
different from that of e 1 ^ 1 * 11 " 2 . No trivial proof for Theorem|6] 
is thus known, since the strategy of stacking the Vandermonde 
matrices (from the reasoning for ( TT8l >) will not work. 

Theorem [6] says that one depends on knowledge about the 
phase distributions for Convolution 4). To verify this, set Wi 
and u! 2 equal to the uniform distributions on [0, ir), and then 
change u>% to the uniform distribution on [ir, 2tt). The phase 
distributions here give the same moments (since they are 
shifted versions). However, the two versions of \{p u]1 + Pu> 2 ) 
give phase distributions with different moments, since we get 
the uniform distribution on [0, ir) in the first case, and the uni- 
form distribution on [0, 2ir) in the second case: the moments 
of these are different, since the uniform distribution on [0, 2tt) 
minimizes the moments of Vandermonde matrices [20|. For 
the same reason, Theorem[6]says that the moments of (f20b are 
minimized when u>\ * Cl . C2 ^2 equals the uniform distribution. 



E. Hankel and Toeplitz matrices 

ll20l states that the moments of V ff V can be expressed in 
terms of volumes of certain convex polytopes. It turns out that 
the moments of Hankel, Markov and Toeplitz matrices can be 
expressed in terms of a subset of these polytopes ||29l , so that 
we can use the same strategy to compute the moments of these 
matrices also. The proof of the following theorem relating to 
the moments of Toeplitz matrices is therefore explained in 
Appendix [B] 

Theorem 7: Define the Toeplitz matrix 



X 


Xi 


x 2 ■ 


X n -2 X n -i 


x l 


Xo 


Xi 


X n -2 


x 2 


Xi 


X 





X n -2 

\ X n -i 



X n -Q 



X? 



X 
Xi 



\ 



x 2 

Xi 
Xo J 



where Xi are i.i.d., real-valued random variables with variance 
1. Let Mi be the 2i'th asymptotic moment of T„ (the odd 



Mi 
M 2 
M 3 
Mi 



= 1 



3 

11 

1435 
24 



A similar result for Hankel matrices also holds: 
Theorem 8: Define the Hankel matrix 



H, 



x 2 



X n -2 
X n -1 

V x n 



x 2 
x 3 



X n -1 

x n 

X n +i 



X n -i 

X n 



X n 
X n +1 



X n 



+ 1 



X 



n+2 



X 2n -3 
X 2n -2 



X 2n -2 
X2n-1- 



where Xi are i.i.d., real-valued random variables with variance 
1. Let Mi be the 2z'th asymptotic moment of H n (the odd 
moments vanish). These moments are given by 



Mi = 

M 2 = 

M 3 = 

Mi = 



1 

8 

3 

14 
100 



Similar results can also be written down for Markov 
matrices, but these expressions are skipped. It seems that 
expressions for the joint distribution of Hankel and Toeplitz 
matrices and matrices D(7V) on the same form as before do 
not exist, meaning that the mixed moments may not exist, or 
that they depend on more than the spectra of the component 
matrices. The details of this are also skipped. 

F. Generalizations to almost sure convergence 

Up to now, we have only shown convergence in distribution 
for the different convolutions and mixed moments. The same 
results also hold when we replace convergence in distribution 
with almost sure convergence in distribution. We summarize 
this in the following result: 

Theorem 9: Assume that the matrices T)i(N) have a joint 
limit distribution as N — > 00, and that Vi, V2, ... are indepen- 
dent, with continuous phase distributions. Any combination of 
matrices on the form (0 converges almost surely in distribu- 
tion, whenever the matrix product is well-defined and square. 

The proof of Theorem [9] can be found in Appendix [D] In 
particular, the matrices we have considered in our convolution 
operations, such as Vf ViVf V 2 , Vf V\ + Vf V 2 , all 
converge almost surely in distribution. 

G. Generalized Vandermonde matrices 

We have not considered generalized Vandermonde matrices 
up to now, i.e. matrices were the columns in V are not uniform 
distributions of powers [30|, [20|. Although similar results can 
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also be stated for these matrices, we only explain how they 
will differ. 

In case of uniform power distribution, the column sum of 
£0 is 



and this is substituted into the integrand of the expression 
defining the Vandermonde mixed moment expansion coeffi- 
cients (see Appendix|A}. For generalized Vandermonde matri- 
ces, one can also define these coefficients [20], the difference 
being that one replaces the sum of the powers ( 1221 with 
a different function, and requires that the function has the 
property proved in Lemma [2] in Appendix [A] The details for 
computing the mixed moments (0 go otherwise the same way 
as the expressions in Appendix [A] with the exception that 
we have different values for the Vandermonde mixed moment 
expansion coefficients. However, the integrals defining these 
coefficients may be hard to compute for a non-uniform power 
distribution, even for the case of uniform phase distribution, 
since Fourier-Motzkin elimination (see Section HVb can be 
applied only in the case of uniform power- and phase dis- 
tribution. 

We conjecture that Theorem [6] holds also for general power 
distributions. It is likely that a similar calculation as in 
Appendix [C] can prove this, but we do not go into details 
on this. 

IV. Software implementation 

In this section, we will repeatedly refer to the implementa- 
tion 1 26 1, which contains all code needed to verify all results 
in this paper. Implementations therein have two purposes: 

1) to generate the exact coefficients in the formulas in this 
paper (generated directly in latex), 

2) to compute the convolution with a given set of moments 
numerically. 

In Appendix [A] we explain why iteration through partitions 
and Fourier-Motzkin elimination are two main things needed 
in the implementation. In this section, we will explain how 
these tasks can be implemented efficiently. 

A. Reducing the complexity in iterating over partitions 

Formulas in |20l and in this paper sum over sets of 
partitions. Iterating over partitions is very time-consuming, and 
must therefore be performed efficiently. There are several ways 
how this can be performecQ. It turns out that one can reduce the 
number of partitions needed for computations considerably. 

Assume that V has uniform phase distribution, and consider 

tr (V H V ■ ■ ■ V H V) . (23) 

To compute ( f23l . we traverse all partitions. For each partition 
an equation system is constructed, and the partition contributes 
with the volume of the corresponding solution set to the 
equation system in ( 1231 . The following observations [31 ], [20 1 
simplifies this computation: 

'The implementation in this paper uses an implementation 1261 which lists 
all partitions of n elements with a given number of blocks 



• If a block is a singleton, then the corresponding volume is 
the same as that of the partition with that block removed. 
By using this observation repeatedly, we obtain that any 
noncrossing partition gives 1 in volume contribution. 

• If a block contains two successive elements, then the 
corresponding volume is the same as that of the partition 
with any one of the two elements removed 

• If a partition is a cyclic shift of another, then the corre- 
sponding volumes are the same. 

These observations can reduce the number of the computations 
dramatically. To make precise how these observations can be 
used, we state two definitions: 

Definition 4: A partition ir is said to be alternating if i 
and i + 1 (where the sum is taken cyclically mod n) are in 
different blocks for all i, and no blocks in ir are singletons. The 
alternating partition obtained by removing all singleton blocks 
and all successive elements in all blocks incrementally is 
called the standard form of the partition. The set of alternating 
partitions of {1, ...,n} with k blocks is denoted A(n, k). 

Definition 5: We say that two partitions are equivalent 
whenever one is a cyclic shift (with a fixed number of 
elements) of the other. 

Note that the standard form of any noncrossing partitions 
is the empty partition. The first two observations above say 
that computations only need to be performed for alternating 
partitions (since any partition can be reduced to an alternating 
one in standard form), while the third observation says that 
computations are only needed for one representative in each 
equivalence class, with equivalence defined as in Definition [5] 
For instance, there are 678570 partitions of {1, 11}. The 
number of alternating partitions of the same set is 4427. The 
number of equivalence classes of alternating partitions is 715. 

The moments of Vandermonde matrices can thus be com- 
puted by iterating over the smaller set of cyclic equivalence 
classes of alternating partitions. This iteration can be accom- 
plished with a computer program^. We also need to keep 
track of the size of each equivalence class of alternating 
partitions. This is done by a program which efficiently hashes 
all partitions. This is a computationally intensive process, but 
which needs to be done only once for the required number of 
moments. 



B. Constructing linear equation systems 

For a partition p = {W l7 ...,W r } E T(n), GUI relates 
( l23l to the corresponding volume of the solution set of the 



2 It is not obvious how the observations can be applied in an efficient 
implementation. The implementation [26:] first generates all partitions, and 
then picks out those which have the alternating property and no singleton 
blocks 
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equations 



feeWi 

Xk-i 

kt£W 2 



keWi 
kew 2 



^ Xk-i = a^fe, 
teff, kew r 



(24) 



where all variables are constrained to lie between and 1. 
p reflects how the w,; are grouped into independent sets of 
variables: The left sides in d24i > represent the V H -terms in the 
entries of the matrix product V^V, whereas the right sides 
represent the V-terms in the same matrix product. Equations 
of the form d24T i also apply to the more general form 



H 



tr(V n V n 



•v*v£) 



(25) 



where Vi, V2, • 
distribution. 



are independent and with uniform phase 



In Appendix [A] it is shown that in order to express the 
arbitrary mixed moments of Vandermonde matrices (indepen- 
dent or not), we need to solve systems similar to (I24t . with 
the difference that different number of variables may appear 
on the left and right hand sides. Note that the volume of 
the solution set of d24T i is always a rational number. This 
enables our implementation to generate exact formulas. For 
Toeplitz and Hankel matrices, it turns out that a subset of these 
equation systems serve the same role in order to compute their 
moments. 



C. Solving the linear equation systems 



In all cases of Toeplitz, Hankel, and Vandermonde, the 
coefficient matrix of the equations we construct has rank r — 1 
(r being the number of blocks), and we need to find the 
number of solutions. Since we also have the constraints that 
< Xi < 1, this really corresponds to finding all solutions to a 
set of linear inequalities. A much preferred method for doing 
so is Fourier-Motzkin elimination ll28l . The first step before 
we perform this elimination would be to bring the equations 
into a standard form. We do this by expressing the r — 1 pivot 
variables (after row reduction) by means of the free variables. 
Since all variables are between and 1 (which are split into 



En— r+1 
7=1 a lj X j 



two inequalities), our equations are 

\ 

En — r 
j=X ~02jXj 



•\n — r+1 

-r- 

vn. — r+1 
- '3= 



j = l ^3 X 3 

En— r+1 
,=i a 2j Xj 



En — r+1 
j=X a (r-l)j x j 

L^j=X ~ a (r-X)j x j 
X\ 

-Xi 
Xi 

-Xi 



Xn — r+1 
~Xn— r+1 



< 
< 
< 
< 

< 
< 
< 
< 
< 
< 

< 
< 



1 



1 



1 



1 



1 



1 

0, 



(26) 



where we have re-indexed the variables so that xx, ...,x n - r +i 
are the free variables, x„_ r+ 2, are the pivot variables. 

The coefficients ay are taken from —1,0,1, and are the 
coefficients we obtain when the pivot variables are expressed 
in terms of the free variables. By reordering the equations, we 
get what we call the standard form (where the equations are 
sorted by the first coefficient): 



xi 



x\ 



ei 



+ YTj=l h r 1 jX j+ x 

En—r 
3 = 1 CXjXj+1 



En—r 
j=X c r 2 j x j+l 



< 
< 

< 
< 

< 



(27) 



fl'2 
fll 



-Xx +YTj=X d r 3 jXj + x < 9r 

Fourier-Motzkin elimination now consists of eliminating the 
first variable, and working on the remaining equations to 
eliminate variables iteratively. Most of the coefficient matrices 
here are combinatorial matrices on the same form as those 
in 11281 . 

Fourier-Motzkin elimination is computationally intensive, in 
the sense that the number of inequalities grow rapidly during 
elimination. Our aim is to compute the volume of the solution 
set rather than finding specific solutions. The volume can be 
split into many smaller disjoint parts, each part corresponds 
to a choice of minimum (min) for the first equations, and a 
choice of maximum (max) for the last equations Each part 
corresponds to the solution of a set of equations with one 
less variable. More precisely, let the equations in d27| i have 
coefficient vectors B\, B ri , Ci, C r2 , Dx, . 



, D r3 , so that 



B t = 
Ci = 
D, = 



(l.&ii, 
(0, (Hx, 
{-l,<kx, ■ 



1 Q(n— r) j ft) 



H(n- 



),9i) 



l<i<rx 
l<i<r 2 
l<i<r 3 . 



Each choice of min, max, with 1 < min < n, 1 < max < r 3 
gives rise to a volume described by the solution to the set of 
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equations 

.Bfc — B m j_ n 1 < k < r\ k =/= min 

D k - Anax 1 < k < r 3 k^ max 

-Dmax + -Bmin 

C k l<k<r 2 , 

where the equations are described by row vectors as above. 
There are r x — 1 + r 3 — 1 + 1 + r 2 = r\ + r 2 + r 3 — 1 
equations here, which is one less equation than what we started 
with. Note that the first element is zero in all these equations, 
so that the first column can be removed in the coefficient 
matrix. Therefore, the original system has been reduced to 
one with one equation less and one less variable. There may 
be more zero leading columns also, and all these can be 
removed. When the leading column is nonzero, the rows are 
sorted so that we get a new system on the form d27l >. and the 
procedure continues. In the process, the choice of max and 
min have decided the lower and upper integral bounds for the 
xi -variable. These are stored, and after all Fourier-Motzkin 
elimination we have a full set of integral bounds, and the 
corresponding volume is computed by integrating over these 
bounds. This can be implemented easily |26l . since integration 
over a volume with integral bounds which are linear in the 
variables can be defined in terms of simple row operations 
and integration by parts. 

D. Optimizations for Fourier-Motzkin elimination 

The challenge in computing the volumes of the solution 
sets in Fourier-Motzkin elimination lies in that there are many 
eliminations which need to be peformed, and we do this for 
every partition in a large set of partitions. The Fourier-Motzkin 
elimination steps themselves can be stored and reused, but this 
of little help since we have to keep track of the corresponding 
integral bounds for the solution sets. There are however, a 
couple of optimizations which can be used during elimination: 

• if both row and the negative of that row are present as an 
equation, the solution set is empty, so that we can stop 
elimination 

• Duplicate rows can be deleted 

• Rows where only the last elements differ can be merged. 

V. Simulations 

Results in this paper have been concerned with finding the 
spectral limit distribution from those of the input matrices. 
However, in practice, one has a certain model where one or 
more parameters are unknown, one observes output from that 
model, and would like to infer on the parameters of the model. 
The strengths in the results of this paper lie in that this kind of 
"deconvolution" is made possible to infer on the parameters 
of various models. As an example, 

1) From observations of the form T)(N)V H V or D(iV) + 
V^V, one can infer on either the spectrum of D(N), or 
the spectrum or phase distribution of V, when exactly 
one of these is unknown. 

2) From observations of the form Vf^ViVl^V^ or 

Vi + V2, one can infer on the spectrum or phase 





Second moment of D 

x Estimated second moment of D 
Third moment of D 

+ Estimated third moment of D 





1.4- + + + + +- 
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N 



Fig. 1. Estimation of the second and third moment of D(A r ) from the 
average of 10 observations of the form D(7V)V H V, for increasing values 
of N. V has dimensions N X N. 

distribution of one of the Vandermonde matrices, when 
one of the Vandermonde matrices is known. 
Moreover, the complexity in this inference is dictated by the 
number of moments considered. We do not go into depths 
on all the different types of deconvolutions made possible, 
only sketch a very simple example of inference as in 1). 
The other types of deconvolution go similarly, since the 
implementation supports each of them through functions with 
similar signatures. The example only makes an estimate of the 
first lower order moments of the component matrix ~D(N). 
These moments can give valuable information: in cases where 
it is known that there are few distinct eigenvalues, and the 
multiplicities are known, only some lower order moments are 
needed in order to get an estimate of these eigenvalues. We 
remark that this kind of deconvolution can be improved by 
further development of a second order theory for Vandermonde 
matrices. 

In Figure Q] we have, for Vandermonde matrices of size 
N x L with L = N, and for increasing iV, formed 10 obser- 
vations of the form D(N)V H V. The average of the moments 
of these observations are then taken, and a method in the 
framework 11261 is applied to get an estimate of the moments 
of D(iV). In the simulation, we have compared the estimate 
for the second and third moment of D(7V) obtained by the 
implementation, with the actual second and third moments. 
The diagonal matrix D(iV) is chosen so that the distribution 
of its eigenvalues is ^<5o.5 + 5^1 + §£i.5> i- e - 0.5, 1, 1.5 are 
the only eigenvalues, and they have equal probability. The 
simulation seems to indicate that the implementation performs 
better estimation when the matrices grow large, in accordance 
with the fact that only an asymptotic result is applied. 

Although it is difficult to make a full picture of the spectral 
distribution of V^Vi (or the phase distribution of Vi) from 
deconvolution on models such as Vf r V 1 V 2 Cf V 2 (although 
the moments in many cases determine the distribution of the 
eigenvalues J32]), such deconvolution can still be useful. For 
instance, from the lower order moments one can to a certain 
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amount say "how far away Vi is from having uniform phase 
distribution", since the uniform phase distribution achieves the 
lowest moments of all Vandermonde matrices ||20l . 

VI. Conclusion and further directions 

This contribution has explained how all types of moments 
in Vandermonde-type expressions can be obtained, and when 
one can expect that the moments/spectrum of the result only 
depend on the moments/spectrum of the input matrices (which 
is a requirement for performing deconvolution). The results 
can be used to compute the moments of any singular law 
involving a combination of many independent matrices. An 
implementation which is capable of performing these moment 
computations is also presented, and moment formulas gener- 
ated by the implementation were presented. The applications 
to wireless communications are still under study 1331 . We 
have also described convolution operations on Vandermonde 
matrices which can not be performed in terms of the spectrum, 
but rather in terms of the phase distributions. We have also 
expanded known results on convergence of Vandermonde 
matrices to almost sure convergence. 

Interestingly, Vandermonde matrices fit into a framework 
similar to that of freeness. Future papers will address a unified 
framework, where a more general theory which addresses 
when deconvolution is possible is presented. 

It is still an open problem to find exact formulas for 
any moment of a Vandermonde matrix. The same applies to 
identifying these moments as the moments of a certain density. 
Future papers may also address how the implementation 
presented here can be made more efficient. 

Appendix A 
The proof of TheoremQ] 

Let us first assume that all phase distributions are uniform. 
Writing out the matrix product in (0 we get 

E E 

(il,...,i„) (jl,--.,jn) 

xE(e' 2 ' 1 '° l|1|j i" 1J ' l(2)j ' 2 ' x--- 

xe H(w<r 1 (2n-l),J n -W CT1 (2n),3 1 )) 

xDiCJVJCji,^) x ••■ x D„(A0(j„,j n ),(28) 

where 

1) l<Ji,...,in<i(asmES) 

2) < i\, ...,i n < Ni — 1 for appropriate I (as in [20 1), 

3) <7i = {(Tii, ...,011^1} with aik = = k}, 

4) is the phase for column ji in the i'th matrix 
entry. 

Define the partition it = tt(Ji, ■■■ 1 j n ) € 3 3 ("-) by equality of 
the ji, i.e. k ~ w I if and only if j k = j h Noting that u ai (k) ,j k , 
LOaUftji are equal if and only if <Ji(k) — <Ji{l) and jk = ji 
(if not they are independent), we define p(ir) < o\ S CP(2n) 
as the partition in CP(n) generated by the relations: 

f [k/2\ + 1 ~ w [l/2\ + 1 and 



Here \x\ means the largest whole number less than x. In other 
words, k and I are in the same block of p(tt) if and only if 
the corresponding phases oj ai ^j k and uj (Ji ^j 1 from the fc'th 
and Tth matrix entries are dependent. We will have use for 
the following relation between p(ir) and tt, which will help us 
to limit our calculations to a certain class of partitions. 
Lemma 1: The following holds: 

H < |p(7r)| -r(7r) + 1, (29) 

Moreover, both equality and strict inequality can occur in ( f29b . 

Proof: Since each block in 7r is associated with at least 
one block in p(ir) by definition, we have that \tt\ < \p(ir)\. 
Moreover, if p\ is adjacent to p2, they have a j-value com- 
mon at their border, so that \ir\ < \p(n)\ — 1. If ps is 
adjacent to {pi,p2}, they also have a j-value common at 
their border, so that also \w\ < \p(n)\ — 2. We can continue 
in this way for p±, p$, p r , and we obtain in the end that 
| tt | < \p(tt) \ — r(7r) + 1. It is also clear from this construction, 
by considering different border possibilities for the p\,P2, 
that both equality and strict inequality can occur. ■ 
In the following we will denote the set of partitions where 
( |29l > holds by 55 (n) (note that 23 (n) will also depend on 
o"i, but this dependency will be implicitly assumed, and will 
thus not be mentioned in the following). Writing p(jf) = 
{Wi, ...jWipWi}, there are |p(7r)| independent phases in the 
corresponding term, which we denote uiwx , ^w^ { „^ ■ Write 
Wj = W'a U W? , where W'» consists of the even elements of 
Wj (corresponding to the V-terms), W" consists of the odd 
elements of Wj (corresponding to the V H -terms). (0 can now 
be written (computations are similar to Appendix 1 in ||20l ) 

E E E 

TreV(n) (ii in) Wi.---.Jn) 



1=1 

\p{ 

X 



-kid/2 L -i 



r=l ^ ' 



xDi(JV)(?l,jl) X ■•• X D n (JV)0'n,in). 



(30) 



Since E (e jriw ) = when ui is uniform and n ^ 0, we get 
that the ii, i n contribute in ( f30b only if 

E *(«!+l)/2+l = E i k/2+l (31) 

fcew^ kew r 

for 1 < r < |p(7r)|. The coefficient matrix of this system, 
denoted A, is a \p(tt)\ x n with entries from {—1,0, 1}. The 
rank of A is at most k — 1, since the sum of all rows is 0. 
Note that the number of solutions to OTT l can also be written 



I 



[0,2tt)IpM 



F(x)dx 1 ■ • •dx| p(w) |, 



where 



-A- 1 - e jNi 2k( X H2k-l)~X b(2k) ) 
F(X) = TT -/ r-, 



k=l 



(32) 



(33) 



where b(k) means the block in p(n) which k belongs to. This 
follows (as in [20|) from summing over all possible choices 
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of ii,...,i n in ( 1301 ). and using the formula for the sum of a 
geometric series. 

It is easily seen that the rank of A is exactly k — 1 when 
p(7r) V [0,1]„ = 1 2 „. More generally, if ^tt) V [0, 1]„ = 
{p%, p r } with each pi > [0, lLuji /2, the rank of the system 
is | yo(7r) | — r. This follows since the equations corresponding 
to each pi have no variables in common with those from other 
Pj, and since the sum of the equations corresponding to pi is 
0, so that the coefficient matrix corresponding to pi has one 
less than full rank. Also, note that Ni s+2 — Ni t+2 whenever 
2s + 1, 2s + 2, 2i + 1, 2t + 2 all belong to the same such p i7 
and denote this common value by N pi (meaning that there 
is a common upper limit N pi to all variables i[ occurring 
in connection with the same block p^. This means that the 
number of solutions to ( f3Tb is of order 



Pi j|/2-| Pi | + l 



= orQi>ii/ 2 -ip*i+i =o(L n -\p\ +T 



Since r depends only on tt, we will also write r — r(n). Since 
the number of solutions to (l3~TT l is given by (l32l l. the limit 



lim 

L — >oo 



1 



n 



l|Pi||/2-|Pi| + l 



[0,2tt)Ip(")I 



F(x)dxi 



■ dx 



pWI 
(34) 

exists (here u denotes the uniform distribution), and we 
will denote this limit by K p f~) tU - Moreover K p /„\ u = 
lTt=i K p t ,u, since the splitting p(w) V [0, 1]„ = {p u p r } 
actually splits the equations into r sets where each set has no 
variables in common with other sets. This definition extends 
that of Vandermonde mixed moment expansion coefficients 
from 1 20 1 to the case where the equations (|3~TT > may have an 
unequal number of variables on each side. Note that in |20|, 
those coefficients were defined in terms of tt S V(n), while 
here they are defined in terms of p(tt), which captures any a\, 
which is new in the analysis given here. Also in accordance 
with [20|, we will denote by K p ^ u ^ the quantity inside the 
limit of ( 134) , so that the number of solutions to ( f3TT > is 



n 

i=l 



pi ±v p{tt),u,L- 



(35) 



The number of blocks in the partitions p(tt),tt say how 
many distinct choices from (ii, i n ) and (ji, j n ), respec- 
tively, contribute in (1301 . By substituting 

^D i (iV)(j fe ,j )t ) = Ltr(D(iV)), 
and using (1291 , we see that d30b is 

Q^-n-l+n-\p(Tr)\+r+\ir\^ 

with equality if and only if tt E 53(ri) by Lemma Q] To check 
if 7r belongs to 23 (n), p(n) needs to be computed, and it is 
checked if equality in (|29l holds. If so, the corresponding 



equation system d3TT > is constructed, and solved using Fourier- 
Motzkin elimination. Adding contributions for all partitions, 
we obtain (|9). 

For 7r € T5{n), noting that we can write n'=i ' CTl! '^ 2 = 
nr = i A r Pi " p '"^ 2 , and using (1351 , we can write the contribution 
from tt in <f3Qb as 

^ Arp -||Pdl/2 L -i L klT^^l r ll/ 2 -(|p ! |-i) KpWu ^ 

r 

= l \pM\-tM -Q N -(\Pi\-i) KpM ^ LD „ 

= f[L^f[N-^-VK pMtU , L D„ 

i=i i=i 

Thus, if uj is uniform, taking limits in ( f30] > gives 
r / L \ lpil ^ 1 

En f K P{ ^ UtL D, 



7rSS(n) i=l 



^ E IW^pm,^ 

r 

= e n( c p? M ^)^< 

7reS(n) i=l 

where we have substituted c Pi — limL~too jt-- When the cj, 
are not uniform, we can still in (l30b sum over the different 
ii, ...,i n to factor out the term 



/ 

J\o. 



F(u)doJi ■ ■ ■ dwip^M, (36) 

where F is defined by d33t . and where the only difference 
from d32b is that the uniform distribution u has been replaced 
with uj. The analysis is otherwise the same as in the uniform 
case, the major issue being the existence of the limits 

' F{u)dwi ■ ■ -eL;| pW |, 



lim 



L — >oc 



ni =1 < 



||«||/2-| Pi |+l 



[0,2tt)Ip(')I 



(37) 



which thus also will be called Vandermonde mixed moment 
expansion coefficients, and denoted Kp^y^. As in the uniform 
case, note that Kp^)^ = Yil=i Kpi,u> an< ^ ^ these limits exist, 
we get as in the uniform case a limit on the form 

E ri(4^%-K 

In |20l . it was shown that the limits K^^ exist when oj has 
a continuous density. We will show that the same holds for 
K p (~\ u , and the proof of this will follow from the following 
lemma, which is a generalization of Lemma 2 in |20l : 

Lemma 2: Let p(ir) < ax € T(2n) be any partition such 
that p(tt) V [0, 1]„ = 1 2 „. For any e > 0, 

1 



lim - , 

AT^oo ]\Tn+l-|p(7r) 



F(uj)dLU = 0, 



(38) 



B r _, 
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where 

B e , k = {(wi, -,u\p(Tr)\)\\w b (?k-i) - ^b(2fc)l > £ } ; (39) 

and where b(k) denotes the block in p(ir) which k belongs to. 

Proof: The condition p(ir) V [0, 1]„ = l2« implies that 
when ui G -B e ,fc, — ujj < 2ne for all i, j, which means that 
the definition of B e ( & is similar to the definition of B t r in 
Lemma 2 in Appendix H in [20|. The proof otherwise follows 
the same lines as (20]. ■ 
Using Lemma |2] repeatedly, we see also that 

lim — , 1 [ F{uj)duj = 



Letting N — > oo, we obtain as in Appendix H of ll20l in the 
limit 

' F(uj)duj 



from the moments Vn ■ Since there are r integrals multiplied 
together in d40i i, its general form is seen to coincide with that 
of ([Tol l. We have thus proved Theorem [TJ When terms of the 
form V^V Wa occur, Section ITlI-DI shows that we can't expect 
dependence on only the moments. Instead the mixed moment 
depends on the entire phase distribution. 

A. Handling the aspect ratio 

We will finally comment on appropriate forms of ( f40l > which 
are useful in implementations of convolution. We first turn 
to the case when there are deterministic matrices present. 
Assume that all matrix aspect ratios are equal to c, so that 

nu(^ hl ) = < 

m n = cM n and d, n 
also be written 



lim - 



Lemma 3: Let V be a Vandermonde matrix with phase 
distribution uj and aspect ratio c. For each n there exists an 
invertible n x n matrix A n so that 

/ 1 



c 2 I: 



3.tu 



( i \ 



i 

v 3 



(41) 



Inserting fiTT i in (|40l when each pi consists of equal phase 
distributions, we obtain that (O is completely determined 



M-r = c |tt|-i when n g %( n y Defining 
= cD n as in ll20l ) in this case, ( |40T > can 



i=l j 

where <jj are the blocks of a. Things have now been reduced 
to the case of uniform phase distribution. In summary, (|30b 
can be written 

]T D n f[ (i2-KC Pi )\"^K PitU ) n / Y[p Uj (x)\»^dx, 

7reS(n) i=l »=1 i 

(40) 

This is the standard form which the implementation uses, 
where the output of Fourier-Motzkin elimination is substituted 
into Kpt v \ u , In d40| i, we recognize the integrals Jfe )W in ( fl2l i. 
We will therefore substitute 1).^ in the following. 

The requirement from Theorem Q] that a > [0, l] n (which 
happens whenever terms of the form V W2 (with u>i , LO2 
different and V Wl , V W2 independent) do not occur) translates 
to the fact that, for any it E CP(n), in all pi the corresponding 
random matrices have equal phase distributions (with no 
assumptions on whether the random matrices are independent 
or not). From this it follows from (l40l > that no integrand 
in ( |40l will contain two different densities. Therefore, the 
mixed moment is completely determined from the integrals 
Ik,u- To make the connection between these quantities and the 
moments, we need the following lemma, compiled from 



E 

7reS(n) 



UK p{7r)iU f[ J l[p Ui (x)^ n ^dx, (42) 



providing a 



i.e. the aspect ratio c can be handled as in 
clear parallel with Proposition 3 in that paper. 

When there are no deterministic matrices present, and a > 
[0, 1]„, the right hand side in (|40b is 



IT 

i=l 



\Pi 



(43) 



where we recognize the elements in the vector on the left 
hand side in d4"TT i. Therefore, an implementation of convolution 
would first compute the c fc_1 /fc iWi using Lemma[3] and substi- 
tute these directly into j43l . For deconvolution, ( |43l would be 
computed first for one of the unknown component matrices, 
and then the moments would be recovered in a second step 
using Lemma [3] 

In summary, in order to compute the mixed moments (O of 
Vandermonde matrices, we need to 

1) iterate through partitions tt 6 T(n), compute p{n), and 
determine whether tt E 'Bin), 

2) perform Fourier-Motzkin elimination in order to solve 
the set of equations given by (fJTJl, 

3) compute the quantities in (|43l , either from direct knowl- 
edge of the phase distribution, or by computing them 
from the moments using ( |4T1 >. 

4) compute the final result by inserting the results from 1), 
2), and 3) into (l40l 

This explains why Section [IV] focuses on the implementation 
perspectives of these tasks. 

Appendix B 
The proofs of the convolution formulas 

In this appendix, we provide additional remarks, which 
together with the proof in Appendix [A] will suffice to prove 
the different convolution formulas. We first provide a short 
explanation why convolution 2) does not depend only on 
the spectra of the component matrices. When Dj(iV) only 
occurs in patterns of the form VD(iV)V H , we factored out 
the moments of D(N) in (f30b in Appendix [A] For other 
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patterns, one ends instead up with integral expressions along 
the diagonal of D(iV) (the diagonal elements of D(iV) are 
multiplied with different complex exponentials), which are 
hard to express in terms of the moments of D(Af). 



more mixed moments). This explains how the implementation 
computes the formulas for Theorem [3] 

Deconvolution for Theorem [3] follows the exact same argu- 
ment as for Theorem [2] 



A. The proof of Theorem \2\ 

We can sum over all tt in (ffOi i for these convolutions, since 
r(w) = 1 and \p(n)\ — \tt\ for all tt whenever a — o\ = l^n- 
The implementation has obtained the result by inserting (|4iT i 
into ( |40] >. in ( |40b can be handled in the following way: 
Let R n be the set of multi-indices r = (rx, r s ) such that 

> The ri are decreasing, and all are integers > 0. 

. r i = n > 

and set d r = \~[i=i d Ti for r = (rx, r s ) € R n - Set also 



(di) 

(d z ,d 2 di,d\) 
(d4,,d%,d 3 di,d 2 dl,di) 
(d 5 ,d 3 d 2 ,d 4 d 1 ,dld 1 ,d 3 dl, d 2 d\, d\), 

and so on. It is clear from Appendix [A] that we can find a 
vector K n such that 

/ Mi \ / 1 \ 



Dx 

D, 
D 3 
Di 
D 5 



M 2 
M 3 



V M n J 



— D„ K n A r - 



V-2 
V 3 

V Vn J 



(44) 



Moreover, the matrices K n and A n can be computed once and 
for all. 

We see that there is only one term on the right hand side 
in d44l i here containing d n , so that this term can be found 
once dx, d n _x have been found. This enables us to perform 
deconvolution. 

B. The proof of Theorem \3\ 
Write tr((D + V ff V)") as 

- E E 

fc ' s (rx,...,r fe ) 



5^ ri = n — k — s 



I 



tr 



\ 



\n times r 2 times s times / 

= E tr(D ri V H V---D rfc V ff V) . 

k ' s (rx,...,r k ) 
J2n = n — k 
rx > s 

Each summand here can be computed by inserting ( |411 i into 
d40l > as above. Also, the multi-indices (rx,---, r k) are easily 
traversed. It is clear that one can generalize ( f44l > to compute 
each summand (the vector K n is simply expanded to handle 



C. The proof of Theorem [4] 

( TToT l corresponds to the case where 

a = ax = {{1,2,5,6,9, 10,.. .}, {3, 4, 7, 8, 11, 12, ...}}. 

Following the notation in (f30b in Appendix |A] when tt = l 2r , 
in dl6l ). p(tt) = [0, l] 2n , so that the contribution is 



k J Pl^dx J Pl 2 {x)dx 

contributes, where k is a scalar. Also, for all other choices of 
n, the integral I n ,uj 2 does not contribute, so that the equation 
for th n'th moment uniquely determines I n>U2 , when the lower 
order integrals {Ik,uj 2 }k<n are known. Due to Lemma [3] the 
same can be said for the moments, so that it is possible to 
perform deconvolution. 

Similarly, the contribution from tt = 1„ in ( TTTb for the term 
when the second summand is always chosen is kI nyU , 2 , where 
k is a scalar. Moreover, I n>U2 contributes only for this term 
and this tt, so that the equation for th n'th moment uniquely 
determines I n . UJ2 . It follows as above that it is possible to 
perform deconvolution. 

D. The proof of Theorem [5] 

This case corresponds to a = l 2n , and 

ax = {[2, 3], [4, 5], [2n - 2, 2n - 1], [2n, 1]}. 

From d40l it is clear that we can write 



(45) 



where K n is an n x 2n matrix, depending only on the values 
computed from Fourier-Motzkin elimination. 

Deconvolution in general for (Theorem is impossible, 
since the equation system (l45T l has twice as many unknowns 
as equations. So, in this case, we need some prior knowledge 
about the phase distribution in order to perform deconvolution. 

E. The proofs of Theorem [7| and Theorem \E\ 

For Toeplitz matrices, ||29l shows that we can compute 
the moments in the same way as for Vandermonde matrices, 
but that we need only consider equations on the form (l24l ) 
with all blocks of p of cardinality two. The case of Hankel 
matrices is similar, however here the variables in (l24l > are 
placed differently on the left and right sidefl 

3 In the software described for this paper, Toeplitz, Hankel, and Vander- 
monde matrices all reuse the same code, but different sets of partitions 
are considered, depending on the type of the matrix. Also, the way the 
corresponding equation is constructed from the partition depends on the type 
of the matrix 



/ Mi \ 




( * \ 


M 2 




v 2 




= K n 




\ M n ) 




{ v 2n ) 
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Appendix C 
The proof of Theorem[6] 

Assume first that all aspect ratios are equal to c. In (25) in 
Theorem 7 in [20|, set Dj(iV) = II, and place the last matrix 



Vij in front instead to obtain 

lim^ E[tr(V il V£V i2 V? x • • • x V in V£)] 

(note that c' p l appears instead of c' p l _1 since is moved to 
front, and thus the additional c-factor is due to the fact that 
we take the trace of a matrix with different dimensions). As in 
Appendix lAl it is straightforward to generalize this to the case 
where the independent Vandermonde matrices have different 
aspect ratios, i.e. d46l i is 



p<ere!P(n) i=l 



n 



Jpncn\ 



where p n <t, is the partition consisting of the blocks of p 
contained in <Ji. Using Theorem 8 in ll2"0l (i.e. we also assume 
that the phase distributions are different, with V, having phase 
distribution u>i), we thus generalize (25) to 



<7>P 



a>p • J ° i=l i=l 

Kp^TT)^- 1 rf[(c lP ^(x))^dx 
J ° <=1 
p2ir s 

= K p>u (2n)^- 1 YUc lPux {x) + c 2 pu >1 (x)) lpl dx 

J i=i 
= (d+Cl^Kp^TT)^ 1 

X / TT ( i ( c lPvi ( x ) + c 2Pw! (%)) ) dx 

J0 i = iV C l+C2 J 

= I™ 1 ^ X 0^Ul* ai , C2 U2,Cl+C2^u 1 * cltC2 U2,Cl+C 2 ) n ]' 

where we have used d46i > on the density Cl ^ C2 ( c iP^i ( x ) + 
C2P Ul (x)). 

Appendix D 
The proof of Theorem[9] 

We will first concentrate on the proof for almost sure 
convergence for a single Vandermonde matrix, as this case is 
the simplest. This proof will follow the same lines as that of 
almost sure convergence in 11291 , in that one uses Chebyshev's 
inequality, the Borel Cantelli lemma, and the following result: 

Lemma 4: Assume that V is an ensemble of random Van- 
dermonde matrices with a continuous phase distribution, such 
that -4 — > c. For any r > 1 there exists a constant C r such 



that, for all L, 



tr ( (V H VY ) - E 



tr 



'V H V) 



< C r L~ 



(47) 

Comparing with (29], |fl~), Lemma [4] suggests that Van- 
dermonde matrices converge somewhat faster than Hankel- 



and Toeplitz matrices, but somewhat slower than Gaussian 
matrices. 

Proof: We can write 

4" 



ir ( ( V" V ! ) -Z 

= E 
-4E 



tr I (V H V) 



tr I (V H V) 



tr I (V H V) 



E 



tr I (V H V) 



+6 E 



-3 E 



tr ( (V H V) 



E 



tr((V H V)' 



tr((V H V)' 



(48) 

We use certain interval partitions to define the following 
classes of partitions in CP(4r): 

• To: partitions tt such that 

7r < {[l,r], [r + l,2r], [2r + l,3r], [3r+l,4r]}, 

• Ti^: partitions 7r ^ To such that 

vr < {[l,2r], [2r + l,3r], [3r + l,4r]}, 

• ^2, 3- partitions tt £ To such that 

tt < {[l,r], [r + l,3r], [3r + l,4r]} 

(all other Tjj are defined similarly), 

• Ti, 2 , 3 : partitions tt £ T U 3\ 2 U Ti, 3 U Ti >4 U T 2 , 3 U 
T 2 , 4 U T 3)4 such that 

tt < {[l,3r], [3r + l,4r]} 

(all other J»,j & are defined similarly), 

• Ti,2,3,4: partitions which are in none of the sets 

These classes of partitions are indexed by which intervals in 
{[l,r], [r+ l,2r], [2r+ l,3r], [3r + l,4r]} are joined (in the 
sense that at least one block in a partition tt in Ti. 2 should 
contain elements from both the first and second interval in 
{[l,r], [r+l,2r], [2r+l,3r], [3r+l,4r]}), and we can write 
T(4r) as a disjoint union: 

T(4r) = T U Ti, 2 U T li3 U Ti, 4 U T 2>3 U T 2 , 4 U T 3 , 4 

UTl, 2 ,3 U Ti,2,4 U Tl, 3 ,4 U T 2 ,3,4 

UTl,2,3,4. (49) 

We will denote the set of sets on the right hand side in d49b 
by §. Write 



Si 



E N ~ irL ^ 



(jl,--,J4r) (il,...,U r ) 
T(jl,...,j4r)=* 



xe t m ! , 

\fc=l 



,(50) 



where 



1) 7r = 7r(ji, j4r) is defined as in Appendix lAl 

2) T is a subset of {1,2,3,4}, 
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3) E T ( Xl ---x 4 ) = mii£T x i)UiETe E ( x i) ( Le - T dic- 
tates which random variables Xi are grouped within the 
same expectation), 

4) b(k) means the block of it which k belongs to 
(with town ■■■^w s independent when W-y, ...W s are the 
blocks of 7r). 

5) k — 1 is formed modulo {[l,r],[r + l,2r],[2r + 
1, 3r], [3r+l, 4r]}, meaning that the values of k — > fc— 1 
actually takes the form 



as possible, the contributions for other 7r in (fSTb is seen to be 



1 

r + 1 
2r + 1 
3r + 1 



r 

2r 
3r 
4r 



fc-1, fcg{l,r + l,2r + l,3r+l}, 



6) jV -4r are all normalizing factors in (Q]i, L -4 are the 
normalizing factors which come from taking the four 
traces for each term in ( |48T >. 

When we write out ( |48T > (by writing out the matrix product as 
in Appendix [A] we end up with sums of the form St,tt, 
with various values for T. We have in particular 



E <S{l,2,3,4},7r = E 
•7rGlP(4n) 

E = ( E 

7rey(4n) 



tr( (V H VY 



tr 



((v-v) r 



Tr e Ti, 3 

7T € ^1,4 
7T € 3> 2 ,3 
7T € ^2,4 

7T € y 3 A 

TT £ ^1,2,3 
7T 6 3>1,2,4 
7T G 3>1,3,4 
7T € 3>2,3,4 
€ ^1,2,3,4 



£{1,2}^ - 4S'{}, 7r + 6S{i. iTr - 3S'{} iT 
£{1,3},^ - 4S{}, 7r + 6S , {} jTr - 3S.Q,,,. 
S{ia},k - 45 , {}, 7r + G^}^ - 3S{} iJr 

S{2,3},ir - 4S'{2 : 3}, 7r + ®S{},tt - SS^.n 
S{2,4},k ~ 4S'{2 i 4}, 7r + 6S{y y7r - 3S{} i7r 
S{3A},t ~ 4S'{3 i 4} iT + 6S'{3 i 4} :7r - 3S , {} j7r 
^{l^^l.Tr - 45{2,3} : 7r + 6S{} i7r ~ 3S{},7r 
<5'{l,2,4},7r - 4S'{2,4} : x + S^},^ - 3S{},ir 
^{1,3,41,-tt — 45^3 4 } jTr + 6S , { 3i 4} 7r — 3S , {} i7r 
-S{2,3,4},7r - 45'{2,3,4},7r + 6S{ 3 .4}^ - 35{} j7r 
S{1,2,3A},tt - 4S'{2,3,4},7r + 6S{ 3 .4}^ - 3S{} !7r . 



Adding everything here, and using that the contributions 
from IPj^fc all are equal for different i,j,k, and that the 
contributions from CPi.j all are equal for different i, j (which is 
obvious by associating each interval [kr + 1, (fc + l)r] with r 
values on a circle, noting that the different classes of partitions 
can be viewed as different ways of connecting the circles, and 
that the actual circles being joined does not matter for the final 
value), we obtain that (08]) equals 



E (s { 



l,2,3,4},7r — 4S'{2,3,4},7r + 6S{3 ! 4} )7r — 3S{}, 



However, since only one Vandermonde matrix appears here, 
the analysis from Appendix lAl simplifies to the case o\= o = 
l4 r , for which the quantities can be expressed directly in terms 
of 7r £ 3>(4r) rather than p(n) £ T(8r) (as in Appendix ©, 
so that the notation from [ 20 1) can be followed more closely. 
As with d30j, (|48]l thus becomes 

E (^{1,2,3,4},7T - 4S'{2,3,4},7r + 6>S{3 ) 4} )W - 3S{} )1r ) 
irer(4r) 

= E E(^f 1,2, 3,41,^ ~ 4-5(2,3,41^ 

ses ires 

+65'{3 i 4} jW — 35{} )W ), (51) 

due to the ordering of the expectations in d48l i. We now 
consider all possibilities for S £ § in (fSTb . For ir £ 7q it 
is clear that one can split the expectations further to obtain 

"{l,2,3,4},ir = >5'{2,3,4},7r = iS{3,4},7r = 0{} )7r , 



i.e. we need only sum over ir £ tPi 2,3,4 ( a ll other terms 
cancel). If the phase distribution is uniform, we consider the 
coefficient matrix for the equation system corresponding to 
Tr £ 3 , i,2,3,4 (formed as in Appendix [A}. This has rank 
\ir\ — 1, so that the number of solutions (ii,...,iir) solving 
the equation system has order ^V 4 '"-kl+ 1 . Since the number 
of ji, ...jjir sucn that ■■■,ji r ) = 7r is of order O (L^l), 
<E3) is 



o 



and by adding up we see that the contribution from S £ 7q in 
( BTT l is 0. Similarly, by splitting up the expectations as much 



'^N- ir L- i L^N Ar -^+ 1 ^ = 0{L- 3 ). (52) 

This proves the claim for the uniform distribution. When the 
Vandermonde matrices do not have uniform phase distribution, 
as long as the phase distribution is continuous, we can reduce 
to the case of uniform phase distribution using Lemma |2] and 
the techniques in Appendix |A] The constant C r needs only 
to be modified by taking into account the maximum of all 
Vandermonde mixed moment expansion coefficients of order 
4r. ■ 
To prove the general case, we must in d47l i replace V ff V 
with the combination appearing in (O. One in this case instead 
considers the interval partition 

{ [1 , nr] , [nr + 1 , 2nr] , [2nr + 1 , 3nr] , [3nr + 1 , Anr] } 

instead of the interval partition {[l,r],[r + l,2r],[2r + 
l,3r],[3r + l,4r]}. The sets of partitions Vq, 7i,2, ■■■ are 
defined similarly, and they are now sets in 3 5 (4m). For a mixed 
moment as in d48l (with V ff V replaced with combinations 
as in©), one shows as before that only partitions in 7x^,3,4 
contribute (i.e. all other terms cancel as above). Since the form 
(O is used, one needs to construct the partition p(jr) £ T(8rn) 
from 7r £ y(4rn), and as in Appendix |Aj only partitions 
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satisfying d29l ) contribute (i.e. ir G B(4rn)), and d52l i becomes 
in this case 

O ^N~ 4rn L~ 4 ZyM L 4r n-\p(Tr)\+r(TT)^ 

= O (V-4''« L -4 L M i4 ™+i-M^ = (L- 3 ), 
and the result follows. 
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